Using Derivation Trees for Treebank Error Detection
نویسندگان
چکیده
This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection.
منابع مشابه
Further Developments in Treebank Error Detection Using Derivation Trees
Linguistic Data Consortium University of Pennsylvania Philadelphia, PA 19104 {skulick,bies,jmott}@ldc.upenn.edu Abstract This work describes how derivation tree fragments based on a variant of Tree Adjoining Grammar (TAG) can be used to check treebank consistency. Annotation of word sequences are compared both for their internal structural consistency, and their external relation to the rest of...
متن کاملCorrecting Syntactic Annotation Errors Using a Synchronous Tree Substitution Grammar
This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our metho...
متن کاملParse disambiguation for a rich HPSG grammar
The fine-grained nature of the HPSG representations found in the Redwoods treebank raises novel issues in parse disambiguation relative to more traditional treebanks such as the Penn treebank, which have been the focus of most past work on probabilistic parsing (e.g., Charniak 1997; Collins 1997). The Redwoods treebank is much richer in the representations it makes available. Most similar to Pe...
متن کاملCorpus-Oriented Grammar Development for Acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank
This paper describes a method of semi-automatically acquiring an English HPSG grammar from the Penn Treebank. First, heuristic rules are employed to annotate the treebank with partially-specified derivation trees. Lexical entries are automatically extracted from the annotated corpus by inversely applying schemata to partially-specified derivation trees.
متن کاملExtraction de PCFG et analyse de phrases pré-typées (PCFG Extraction and Pre-typed Sentences Analysis) [in French]
PCFG Extraction and Pre-typed Sentences Analysis This article explains the way we extract a PCFG from the Paris VII treebank. Firslty, we need to transform the syntactic trees of the corpus into derivation trees. The transformation is done with a generalized tree transducer, a variation of the usual top-down tree transducers, and gives as result some derivation trees for an AB grammar. Secondel...
متن کامل